skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Cohn, Clayton"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract Although the “eye-mind link” hypothesis posits that eye movements provide a direct window into cognitive processing, linking eye movements to specific cognitions in real-world settings remains challenging. This challenge may arise because gaze metrics such as fixation duration, pupil size, and saccade amplitude are often aggregated across timelines that include heterogeneous events. To address this, we tested whether aggregating gaze parameters across participant-defined events could support the hypothesis that increased focal processing, indicated by greater gaze duration and pupil diameter, and decreased scene exploration, indicated by smaller saccade amplitude, would predict effective task performance. Using head-mounted eye trackers, nursing students engaged in simulation learning and later segmented their simulation footage into meaningful events, categorizing their behaviors, task outcomes, and cognitive states at the event level. Increased fixation duration and pupil diameter predicted higher student-rated teamwork quality, while increased pupil diameter predicted judgments of effective communication. Additionally, increased saccade amplitude positively predicted students’ perceived self-efficacy. These relationships did not vary across event types, and gaze parameters did not differ significantly between the beginning, middle, and end of events. However, there was a significant increase in fixation duration during the first five seconds of an event compared to the last five seconds of the previous event, suggesting an initial encoding phase at an event boundary. In conclusion, event-level gaze parameters serve as valid indicators of focal processing and scene exploration in natural learning environments, generalizing across event types. 
    more » « less
  2. This paper explores the use of large language models (LLMs) to score and explain short-answer assessments in K-12 science. While existing methods can score more structured math and computer science assessments, they often do not provide explanations for the scores. Our study focuses on employing GPT-4 for automated assessment in middle school Earth Science, combining few-shot and active learning with chain-of-thought reasoning. Using a human-in-the-loop approach, we successfully score and provide meaningful explanations for formative assessment responses. A systematic analysis of our method's pros and cons sheds light on the potential for human-in-the-loop techniques to enhance automated grading for open-ended science assessments. 
    more » « less
  3. LLMs have demonstrated proficiency in contextualizing their outputs using human input, often matching or beating human-level performance on a variety of tasks. However, LLMs have not yet been used to characterize synergistic learning in students’ collaborative discourse. In this exploratory work, we take a first step towards adopting a human-in-the-loop prompt engineering approach with GPT-4-Turbo to summarize and categorize students’ synergistic learning during collaborative discourse. Our preliminary findings suggest GPT-4-Turbo may be able to characterize students’ synergistic learning in a manner comparable to humans and that our approach warrants further investigation. 
    more » « less
  4. This research explores a novel human-in-the-loop approach that goes beyond traditional prompt engineering approaches to harness Large Language Models (LLMs) with chain-of-thought prompting for grading middle school students’ short answer formative assessments in science and generating useful feedback. While recent efforts have successfully applied LLMs and generative AI to automatically grade assignments in secondary classrooms, the focus has primarily been on providing scores for mathematical and programming problems with little work targeting the generation of actionable insight from the student responses. This paper addresses these limitations by exploring a human-in-the-loop approach to make the process more intuitive and more effective. By incorporating the expertise of educators, this approach seeks to bridge the gap between automated assessment and meaningful educational support in the context of science education for middle school students. We have conducted a preliminary user study, which suggests that (1) co-created models improve the performance of formative feedback generation, and (2) educator insight can be integrated at multiple steps in the process to inform what goes into the model and what comes out. Our findings suggest that in-context learning and human-in-the-loop approaches may provide a scalable approach to automated grading, where the performance of the automated LLM-based grader continually improves over time, while also providing actionable feedback that can support students’ open-ended science learning. 
    more » « less
  5. Martin Fred; Norouzi, Narges; Rosenthal, Stephanie (Ed.)
    This paper examines the use of LLMs to support the grading and explanation of short-answer formative assessments in K12 science topics. While significant work has been done on programmatically scoring well-structured student assessments in math and computer science, many of these approaches produce a numerical score and stop short of providing teachers and students with explanations for the assigned scores. In this paper, we investigate few-shot, in-context learning with chain-of-thought reasoning and active learning using GPT-4 for automated assessment of students’ answers in a middle school Earth Science curriculum. Our findings from this human-in-the-loop approach demonstrate success in scoring formative assessment responses and in providing meaningful explanations for the assigned score. We then perform a systematic analysis of the advantages and limitations of our approach. This research provides insight into how we can use human-in-the-loop methods for the continual improvement of automated grading for open-ended science assessments. 
    more » « less
  6. Grieff, S. (Ed.)
    Recently there has been increased development of curriculum and tools that integrate computing (C) into Science, Technology, Engineering, and Math (STEM) learning environments. These environments serve as a catalyst for authentic collaborative problem-solving (CPS) and help students synergistically learn STEM+C content. In this work, we analyzed students’ collaborative problem-solving behaviors as they worked in pairs to construct computational models in kinematics. We leveraged social measures, such as equity and turn-taking, along with a domain-specific measure that quantifies the synergistic interleaving of science and computing concepts in the students’ dialogue to gain a deeper understanding of the relationship between students’ collaborative behaviors and their ability to complete a STEM+C computational modeling task. Our results extend past findings identifying the importance of synergistic dialogue and suggest that while equitable discourse is important for overall task success, fluctuations in equity and turn-taking at the segment level may not have an impact on segment-level task performance. To better understand students’ segment-level behaviors, we identified and characterized groups’ planning, enacting, and reflection behaviors along with monitoring processes they employed to check their progress as they constructed their models. Leveraging Markov Chain (MC) analysis, we identified differences in high- and low-performing groups’ transitions between these phases of students’ activities. We then compared the synergistic, turn-taking, and equity measures for these groups for each one of the MC model states to gain a deeper understanding of how these collaboration behaviors relate to their computational modeling performance. We believe that characterizing differences in collaborative problem-solving behaviors allows us to gain a better understanding of the difficulties students face as they work on their computational modeling tasks. 
    more » « less
  7. Wang, N. (Ed.)
    In education, intelligent learning environments allow students to choose how to tackle open-ended tasks while monitoring performance and behavior, allowing for the creation of adaptive support to help students overcome challenges. Timely feedback is critical to aid students’ progression toward learning and improved problem-solving. Feedback on text-based student responses can be delayed when teachers are overloaded with work. Automated evaluation can provide quick student feedback while easing the manual evaluation burden for teachers in areas with a high teacher-to-student ratio. Current methods of evaluating student essay responses to questions have included transformer-based natural language processing models with varying degrees of success. One main challenge in training these models is the scarcity of data for student-generated data. Larger volumes of training data are needed to create models that perform at a sufficient level of accuracy. Some studies have vast data, but large quantities are difficult to obtain when educational studies involve student-generated text. To overcome this data scarcity issue, text augmentation techniques have been employed to balance and expand the data set so that models can be trained with higher accuracy, leading to more reliable evaluation and categorization of student answers to aid teachers in the student’s learning progression. This paper examines the text-generating AI model, GPT-3.5, to determine if prompt-based text-generation methods are viable for generating additional text to supplement small sets of student responses for machine learning model training. We augmented student responses across two domains using GPT-3.5 completions and used that data to train a multilingual BERT model. Our results show that text generation can improve model performance on small data sets over simple self-augmentation. 
    more » « less
  8. Jovanovic, Jelena; Chounta, Irene-Angelica; Uhomoibhi, James; McLaren, Bruce (Ed.)
    Computer-supported education studies can perform two important roles. They can allow researchers to gather important data about student learning processes, and they can help students learn more efficiently and effectively by providing automatic immediate feedback on what the students have done so far. The evaluation of student work required for both of these roles can be relatively easy in domains like math, where there are clear right answers. When text is involved, however, automated evaluations become more difficult. Natural Language Processing (NLP) can provide quick evaluations of student texts. However, traditional neural network approaches require a large amount of data to train models with enough accuracy to be useful in analyzing student responses. Typically, educational studies collect data but often only in small amounts and with a narrow focus on a particular topic. BERT-based neural network models have revolutionized NLP because they are pre-trained on very large corpora, developing a robust, contextualized understanding of the language. Then they can be “fine-tuned” on a much smaller set of data for a particular task. However, these models still need a certain base level of training data to be reasonably accurate, and that base level can exceed that provided by educational applications, which might contain only a few dozen examples. In other areas of artificial intelligence, such as computer vision, model performance on small data sets has been improved by “data augmentation” — adding scaled and rotated versions of the original images to the training set. This has been attempted on textual data; however, augmenting text is much more difficult than simply scaling or rotating images. The newly generated sentences may not be semantically similar to the original sentence, resulting in an improperly trained model. In this paper, we examine a self-augmentation method that is straightforward and shows great improvements in performance with different BERT-based models in two different languages and on two different tasks that have small data sets. We also identify the limitations of the self-augmentation procedure. 
    more » « less
  9. AbstractRecent advances in generative artificial intelligence (AI) and multimodal learning analytics (MMLA) have allowed for new and creative ways of leveraging AI to support K12 students' collaborative learning in STEM+C domains. To date, there is little evidence of AI methods supporting students' collaboration in complex, open‐ended environments. AI systems are known to underperform humans in (1) interpreting students' emotions in learning contexts, (2) grasping the nuances of social interactions and (3) understanding domain‐specific information that was not well‐represented in the training data. As such, combined human and AI (ie, hybrid) approaches are needed to overcome the current limitations of AI systems. In this paper, we take a first step towards investigating how a human‐AI collaboration between teachers and researchers using an AI‐generated multimodal timeline can guide and support teachers' feedback while addressing students' STEM+C difficulties as they work collaboratively to build computational models and solve problems. In doing so, we present a framework characterizing the human component of our human‐AI partnership as a collaboration between teachers and researchers. To evaluate our approach, we present our timeline to a high school teacher and discuss the key insights gleaned from our discussions. Our case study analysis reveals the effectiveness of an iterative approach to using human‐AI collaboration to address students' STEM+C challenges: the teacher can use the AI‐generated timeline to guide formative feedback for students, and the researchers can leverage the teacher's feedback to help improve the multimodal timeline. Additionally, we characterize our findings with respect to two events of interest to the teacher: (1) when the students cross adifficulty threshold,and (2) thepoint of intervention, that is, when the teacher (or system) should intervene to provide effective feedback. It is important to note that the teacher explained that there should be a lag between (1) and (2) to give students a chance to resolve their own difficulties. Typically, such a lag is not implemented in computer‐based learning environments that provide feedback. Practitioner notesWhat is already known about this topicCollaborative, open‐ended learning environments enhance students' STEM+C conceptual understanding and practice, but they introduce additional complexities when students learn concepts spanning multiple domains.Recent advances in generative AI and MMLA allow for integrating multiple datastreams to derive holistic views of students' states, which can support more informed feedback mechanisms to address students' difficulties in complex STEM+C environments.Hybrid human‐AI approaches can help address collaborating students' STEM+C difficulties by combining the domain knowledge, emotional intelligence and social awareness of human experts with the general knowledge and efficiency of AI.What this paper addsWe extend a previous human‐AI collaboration framework using a hybrid intelligence approach to characterize the human component of the partnership as a researcher‐teacher partnership and present our approach as a teacher‐researcher‐AI collaboration.We adapt an AI‐generated multimodal timeline to actualize our human‐AI collaboration by pairing the timeline with videos of students encountering difficulties, engaging in active discussions with a high school teacher while watching the videos to discern the timeline's utility in the classroom.From our discussions with the teacher, we define two types ofinflection pointsto address students' STEM+C difficulties—thedifficulty thresholdand theintervention point—and discuss how thefeedback latency intervalseparating them can inform educator interventions.We discuss two ways in which our teacher‐researcher‐AI collaboration can help teachers support students encountering STEM+C difficulties: (1) teachers using the multimodal timeline to guide feedback for students, and (2) researchers using teachers' input to iteratively refine the multimodal timeline.Implications for practice and/or policyOur case study suggests that timeline gaps (ie, disengaged behaviour identified by off‐screen students, pauses in discourse and lulls in environment actions) are particularly important for identifying inflection points and formulating formative feedback.Human‐AI collaboration exists on a dynamic spectrum and requires varying degrees of human control and AI automation depending on the context of the learning task and students' work in the environment.Our analysis of this human‐AI collaboration using a multimodal timeline can be extended in the future to support students and teachers in additional ways, for example, designing pedagogical agents that interact directly with students, developing intervention and reflection tools for teachers, helping teachers craft daily lesson plans and aiding teachers and administrators in designing curricula. 
    more » « less